16 research outputs found
AutoBiasTest: Controllable Sentence Generation for Automated and Open-Ended Social Bias Testing in Language Models
Social bias in Pretrained Language Models (PLMs) affects text generation and
other downstream NLP tasks. Existing bias testing methods rely predominantly on
manual templates or on expensive crowd-sourced data. We propose a novel
AutoBiasTest method that automatically generates sentences for testing bias in
PLMs, hence providing a flexible and low-cost alternative. Our approach uses
another PLM for generation and controls the generation of sentences by
conditioning on social group and attribute terms. We show that generated
sentences are natural and similar to human-produced content in terms of word
length and diversity. We illustrate that larger models used for generation
produce estimates of social bias with lower variance. We find that our bias
scores are well correlated with manual templates, but AutoBiasTest highlights
biases not captured by these templates due to more diverse and realistic test
sentences. By automating large-scale test sentence generation, we enable better
estimation of the underlying bias distribution.
Calendar.help: Designing a Workflow-Based Scheduling Agent with Humans in the Loop
Although information workers may complain about meetings, they are an
essential part of their work life. Consequently, busy people spend a
significant amount of time scheduling meetings. We present Calendar.help, a
system that provides fast, efficient scheduling through structured workflows.
Users interact with the system via email, delegating their scheduling needs to
the system as if it were a human personal assistant. Common scheduling
scenarios are broken down using well-defined workflows and completed as a
series of microtasks that are automated when possible and executed by a human
otherwise. Unusual scenarios fall back to a trained human assistant who
executes them as unstructured macrotasks. We describe the iterative approach we
used to develop Calendar.help, and share the lessons learned from scheduling
thousands of meetings during a year of real-world deployments. Our findings
provide insight into how complex information tasks can be broken down into
repeatable components that can be executed efficiently to improve productivity.
Comment: 10 pages
Can You Label Less by Using Out-of-Domain Data? Active & Transfer Learning with Few-shot Instructions
Labeling social-media data for custom dimensions of toxicity and social bias
is challenging and labor-intensive. Existing transfer and active learning
approaches meant to reduce annotation effort require fine-tuning, which suffers
from over-fitting to noise and can cause domain shift with small sample sizes.
In this work, we propose a novel Active Transfer Few-shot Instructions (ATF)
approach which requires no fine-tuning. ATF leverages the internal linguistic
knowledge of pre-trained language models (PLMs) to facilitate the transfer of
information from existing pre-labeled datasets (source-domain task) with
minimum labeling effort on unlabeled target data (target-domain task). Our
strategy can yield positive transfer, achieving a mean AUC gain of 10.5%
compared to no transfer with a large 22B-parameter PLM. We further show that
annotation of just a few target-domain samples via active learning can be
beneficial for transfer, but the impact diminishes with more annotation effort
(26% drop in gain between 100 and 2000 annotated examples). Finally, we find
that not all transfer scenarios yield a positive gain, which seems related to
the PLM's initial performance on the target-domain task.
Comment: Accepted to NeurIPS Workshop on Transfer Learning for Natural
Language Processing, 2022, New Orleans
Designing Engaging Conversational Interactions for Health & Behavior Change
Thesis (Ph.D.)--University of Washington, 2021
The recent popularity of chat and voice-based conversational interactions, fueled by advances in natural language processing (NLP), has opened up opportunities for re-imagining user interactions in health & behavior change as conversational experiences. Prior work has indicated that a well-designed conversational approach can be more engaging, motivating, natural, personal, and understandable. It can also mimic the properties of some of the most successful human-led interventions, such as coaching and motivational interviewing. However, designing conversational interactions poses numerous challenges. Efficiently creating conversational content that is diverse, relevant to the context, and natural-sounding is difficult. Furthermore, balancing the still limited AI capabilities with user expectations requires careful problem scoping and other design considerations. Finally, the mechanisms by which a successful conversational interaction can help improve user engagement are still not well explored. In this dissertation, I propose four conversational systems that address some of the fundamental challenges in health & behavior change. In Chapter 3, to address the intrinsic challenge of user boredom and engagement loss with repeated interactions, I propose a conversational system with value-based conversation topic personalization and diversification. In Chapter 4, to address the challenge of engaging users in mindful self-learning from their behavioral data, I propose conversational systems supporting structured reflection on physical activity and on professional development at work. In Chapter 5, to support health data collection, especially to improve user comfort with sensitive topics and understandability among low-literacy populations, I propose a system for conversational survey administration. 
Finally, in Chapter 6, to lower the effort involved in designing good-quality conversational systems, I propose a tool for automated conversion of form-based surveys to a more engaging conversational format. My work identifies and provides evidence for several benefits of the use of conversational interactions in health & behavior change. Among others, I demonstrate the benefits of increased engagement in interaction, improved motivation for performing activities, accessibility benefits related to familiarity, ease of use, comfort with sharing, and an ability to guide users in the behavior change process via dialogue. I also identify several important challenges: perceptions of artificiality, managing high expectations of contextual knowledge and social intelligence, as well as lower efficiency that could negatively affect the experience for some user groups. I further investigate the concrete links between conversational design elements and these benefits and challenges. My thesis demonstrates various design processes and automation techniques that can lower the effort of designing conversational experiences. As technology progresses, conversational interactions can offer valuable support, complementing existing automated tracking and the efforts of human health coaches. My work offers an important contribution to our understanding of how conversational interactions can play such a beneficial role.
Stress Analytics in Education
During their college and university education, students are exposed to different kinds of stress, especially during difficult study periods such as final exam weeks or project deadlines. Stress in the long run is dangerous and can contribute to illness through its physiological effects or maladaptive health behaviors. Many students admit, or are self-aware, that they become stressed under different circumstances and have some clues about their potential stressors. Still, even for such students, the monitoring and awareness of stress are not systematic and are based on subjective data, i.e. someone’s feelings. In our work we aim to provide students with the means to become aware of their past, current and expected (objectively measured) stress and its correlation with their performance, to understand their stressors, and to cope with and prevent stress, and thus to live healthier and happier lives and better organize their studies.
Personalized stress management : enabling stress monitoring with LifelogExplorer
Stress is one of the major triggers for many diseases. Improving stress balance is therefore an important prevention step. With advances in wearable sensors, it becomes possible to continuously monitor and analyse users’ behavior and arousal in an unobtrusive way. In this paper, we report on a case study in which users (21 teachers of a vocational school) were provided with wearable sensors and could view their arousal information in the context of their life events over a period of four weeks, using our software tool in an unsupervised setting. The goal was to evaluate user engagement and the enabling of self-coaching abilities. Our results show that users actively explored their arousal data during the study. Further qualitative evaluation conducted with 15 of the 21 users indicated that 12 of the 15 were able to learn about their stress patterns based on the information they obtained, but only 5 of them were able to come up with practical interventions for improving their stress balance on their own, while other users were of the opinion that nothing can be done to reduce their stress. This suggests that self-coaching has some potential but that there is a need for further coaching support.
A trust evaluation framework for sensor readings in body area sensor networks
This paper presents a framework for evaluating the trustworthiness of Body Area Sensor Networks (BASNs), in particular of sensor readings. We show that such trustworthiness is to be interpreted with respect to a certain statement or goal; its evaluation is based on quality aspects derived from observations and opinions from others. We examine relevant quality aspects of sensor readings which correspond to potential deviating behaviors of sensors. We then look at how to derive such qualities from observations, taking uncertainty as well as decay over time into account. We develop an extension of subjective logic for this purpose and we show how we can compute quality properties without storing long time series. We then demonstrate this for two examples, including Galvanic Skin Response (GSR) and Electrocardiography (ECG) sensed data.
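The idea of deriving a quality opinion from observations, with uncertainty and decay over time and without storing the full time series, can be illustrated with a minimal sketch. This is not the extension developed in the paper: it uses the standard subjective-logic mapping from evidence counts to a (belief, disbelief, uncertainty) opinion with prior weight W = 2, and assumes a simple exponential forgetting factor. The class name `DecayingEvidence`, the `decay` value, and the notion of a reading being "consistent" are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

W = 2.0  # non-informative prior weight of standard subjective logic


@dataclass
class DecayingEvidence:
    """Running evidence counts; only two numbers are stored, not the series."""
    decay: float = 0.9  # hypothetical forgetting factor applied per observation
    r: float = 0.0      # decayed count of readings judged consistent
    s: float = 0.0      # decayed count of readings judged deviating

    def observe(self, consistent: bool) -> None:
        # Age the existing evidence, then add the new observation.
        self.r *= self.decay
        self.s *= self.decay
        if consistent:
            self.r += 1.0
        else:
            self.s += 1.0

    def opinion(self) -> tuple[float, float, float]:
        # Evidence-to-opinion mapping: the three components always sum to 1.
        total = self.r + self.s + W
        return self.r / total, self.s / total, W / total


def demo() -> tuple[float, float, float]:
    # Hypothetical sequence of checks on a sensor's readings.
    ev = DecayingEvidence(decay=0.9)
    for ok in [True, True, True, False, True]:
        ev.observe(ok)
    return ev.opinion()


if __name__ == "__main__":
    belief, disbelief, uncertainty = demo()
    print(f"b={belief:.3f} d={disbelief:.3f} u={uncertainty:.3f}")
```

With no observations the opinion is fully uncertain, and because old evidence is discounted at every step, uncertainty never falls to zero however many readings arrive.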